This report analyzes my Facebook data, which was extracted through the Facebook Graph API and then cleaned and wrangled into suitable datasets using Python and R.
## YEAR Mean_Likes Median_Likes Mean_Comments
## Min. :2011 Min. : 4.00 Min. : 4.000 Min. :0.000
## 1st Qu.:2012 1st Qu.: 7.00 1st Qu.: 4.000 1st Qu.:1.000
## Median :2014 Median :13.00 Median : 8.000 Median :2.000
## Mean :2014 Mean :16.57 Mean : 9.571 Mean :2.429
## 3rd Qu.:2016 3rd Qu.:27.50 3rd Qu.:14.000 3rd Qu.:4.000
## Max. :2017 Max. :30.00 Max. :19.000 Max. :5.000
## Median_Comments Likes Comments Posts
## Min. :0.0000 Min. : 62 Min. : 12.0 Min. : 14.0
## 1st Qu.:0.0000 1st Qu.:1002 1st Qu.:140.0 1st Qu.: 72.5
## Median :0.0000 Median :2856 Median :186.0 Median :125.0
## Mean :0.8571 Mean :2314 Mean :316.6 Mean :151.1
## 3rd Qu.:1.5000 3rd Qu.:3373 3rd Qu.:463.0 3rd Qu.:183.0
## Max. :3.0000 Max. :4534 Max. :812.0 Max. :408.0
## TYPE_VIDEO TYPE_LINK TYPE_PHOTO TYPE_STATUS
## Min. : 0.00 Min. : 0.00 Min. : 5.00 Min. : 9.00
## 1st Qu.: 0.50 1st Qu.:11.00 1st Qu.:33.50 1st Qu.: 11.00
## Median : 1.00 Median :23.00 Median :49.00 Median : 18.00
## Mean : 53.57 Mean :19.14 Mean :42.57 Mean : 35.71
## 3rd Qu.: 26.50 3rd Qu.:27.00 3rd Qu.:56.50 3rd Qu.: 38.50
## Max. :320.00 Max. :35.00 Max. :64.00 Max. :124.00
## 'data.frame': 7 obs. of 12 variables:
## $ YEAR : int 2011 2012 2013 2014 2015 2016 2017
## $ Mean_Likes : int 4 5 13 25 30 30 9
## $ Median_Likes : int 4 4 8 19 19 9 4
## $ Mean_Comments : int 1 1 4 5 4 2 0
## $ Median_Comments: int 0 0 1 3 2 0 0
## $ Likes : int 62 506 2856 3162 1497 4534 3584
## $ Comments : int 12 132 812 661 186 265 148
## $ Posts : int 14 95 217 125 50 149 408
## $ TYPE_VIDEO : int 0 0 1 1 6 47 320
## $ TYPE_LINK : int 0 16 28 23 6 35 26
## $ TYPE_PHOTO : int 5 47 64 55 20 58 49
## $ TYPE_STATUS : int 9 32 124 45 18 9 13
The Facebook data yielded many insights. The important ones are presented below:
## 'data.frame': 1075 obs. of 18 variables:
## $ ID : Factor w/ 1073 levels "1405174859572227_1000278100061907",..: 775 775 572 586 585 583 580 579 578 576 ...
## $ DAY : int 31 31 30 29 24 8 25 25 25 19 ...
## $ MONTH : int 12 12 12 12 12 12 10 10 10 10 ...
## $ YEAR : int 2010 2010 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ DATE : Factor w/ 577 levels "1/1/13","1/1/15",..: 171 171 169 164 158 179 85 85 85 74 ...
## $ HOUR : int 20 20 5 3 9 7 8 8 8 9 ...
## $ MIN : int 0 0 29 54 33 59 40 36 7 39 ...
## $ SEC : int 0 0 49 29 4 4 52 27 44 43 ...
## $ TIME : Factor w/ 1060 levels "00:00:59","00:02:16",..: 925 925 166 88 322 279 305 303 285 324 ...
## $ TYPE : Factor w/ 5 levels "link","note",..: 1 1 3 3 4 3 4 3 4 4 ...
## $ LIKES : int 0 0 5 1 5 0 0 3 6 3 ...
## $ COMMENTS : int 0 0 0 0 1 0 0 0 2 3 ...
## $ TOTAL_WORDS: int 0 0 6 0 218 0 0 0 46 46 ...
## $ POS : num 0 0 0.667 0 0.045 0 0 0 0.355 0.213 ...
## $ NEG : num 0 0 0 0 0.138 0 0 0 0.041 0 ...
## $ NEU : num 0 0 0.333 0 0.817 0 0 0 0.604 0.787 ...
## $ COMP : num 0 0 0.612 0 -0.964 0 0 0 0.96 0.852 ...
## $ UNIQ_WORDS : int 0 0 0 0 94 0 0 0 22 24 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 2.085 2.000 70.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 3.00 6.00 15.38 15.00 233.00
Posts are divided into 4 categories (Words_Level) based on the number of words (TOTAL_WORDS):
##
## Vlow Low Med High
## 757 113 43 162
Most of the posts fall in the Vlow category, followed by the High category.
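The binning into Words_Level can be sketched as follows. This is a minimal illustration, not the code used for the report: only the Vlow cutoff (fewer than 25 words) is mentioned in this report, while the 50 and 100 cutoffs and the helper name `words_level` are assumptions.

```python
from collections import Counter

# Exclusive upper bounds for each level. Only the 25-word Vlow cutoff comes
# from the report; the 50 and 100 cutoffs are assumed for illustration.
LEVEL_BOUNDS = [(25, "Vlow"), (50, "Low"), (100, "Med")]


def words_level(total_words):
    """Map a post's TOTAL_WORDS to a Words_Level label (hypothetical helper)."""
    for upper, label in LEVEL_BOUNDS:
        if total_words < upper:
            return label
    return "High"


# TOTAL_WORDS of the first ten posts shown in the str() output above.
total_words = [0, 0, 6, 0, 218, 0, 0, 0, 46, 46]
level_counts = Counter(words_level(n) for n in total_words)
print(level_counts)
```

With different cutoffs the Low, Med and High counts would shift, so the table above should be read against whatever thresholds were actually used.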
Importantly, the number of videos shared has increased over the years, slowly until 2015 and then sharply in 2016 and 2017.
Posts show a clear bimodal distribution over the hour of the day.
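The hourly tabulation behind such a distribution can be sketched as below. It uses only the HOUR values of the ten posts visible in the str() output, so it illustrates the counting step and cannot reproduce the bimodal shape of the full dataset; the variable names are introduced here.

```python
from collections import Counter

# HOUR values of the first ten posts shown in the str() output above.
hours = [20, 20, 5, 3, 9, 7, 8, 8, 8, 9]

counts = Counter(hours)
# One bin per hour of the day, including hours with no posts.
posts_per_hour = [counts.get(h, 0) for h in range(24)]

# Crude text histogram: one '#' per post in that hour.
for hour, n in enumerate(posts_per_hour):
    print(f"{hour:02d} {'#' * n}")
```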
Note: Although posts with Words_Level High are extremely positive, positivity declines as TOTAL_WORDS increases.
The trends below provide further insight into the dataset and support the conclusions above.
The number of posts rises through 2013, declines in 2014 and 2015, and recovers in 2016, but the rise is steepest in 2017, with 4 months of the year still to go!
Mean_Comments and Mean_Likes follow a similar relationship, but likes far outnumber comments: for 2016, Mean_Likes is roughly 17 times Mean_Comments (4534 likes versus 265 comments over 149 posts).
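As a cross-check, the yearly means can be recomputed from the totals in the yearly table, assuming Mean_Likes = Likes / Posts and Mean_Comments = Comments / Posts. The report does not state this definition, but the rounded results agree with the tabulated means. The field name `likes_per_comment` is introduced here for illustration.

```python
# Yearly totals copied from the str() output of the yearly data frame.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
likes = [62, 506, 2856, 3162, 1497, 4534, 3584]
comments = [12, 132, 812, 661, 186, 265, 148]
posts = [14, 95, 217, 125, 50, 149, 408]

yearly = {}
for year, n_likes, n_comments, n_posts in zip(years, likes, comments, posts):
    yearly[year] = {
        # Assumed definitions: totals divided by the number of posts.
        "mean_likes": n_likes / n_posts,
        "mean_comments": n_comments / n_posts,
        # Likes received per comment received in that year.
        "likes_per_comment": n_likes / n_comments,
    }

for year, row in yearly.items():
    print(
        year,
        round(row["mean_likes"], 2),
        round(row["mean_comments"], 2),
        round(row["likes_per_comment"], 1),
    )
```

For 2016 this gives about 17.1 likes per comment, which is the ratio between the two means for that year.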
The graph shows that the most-liked posts come from recent years, whereas the most-commented posts come from the early years!
Except for 2012 and 2017, TOTAL_WORDS per post has been approximately the same across the years, falling in the Vlow Words_Level (fewer than 25 words).
Except for the Vlow category, all the other Words_Level categories seem to have the same median Likes.
The steep increase in the number of videos shared in 2017 is clearly visible.
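The size of that jump can be quantified from the yearly table by taking videos as a share of all posts in each year. The sketch below uses the TYPE_VIDEO and Posts columns as printed; the name `video_share` is introduced here.

```python
# Columns copied from the str() output of the yearly data frame.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
posts = [14, 95, 217, 125, 50, 149, 408]
type_video = [0, 0, 1, 1, 6, 47, 320]

# Fraction of each year's posts that are videos.
video_share = {
    year: n_video / n_posts
    for year, n_video, n_posts in zip(years, type_video, posts)
}

for year, share in video_share.items():
    print(year, f"{share:.1%}")
```

Videos make up about 78% of the posts in 2017, compared with about 32% in 2016 and 12% in 2015.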
As per the latest statistics, posts of type Photo generate the most Likes, followed by Status.
The median Comments are highest for the High Words_Level, which is similar to the distribution of Likes.
Let’s try to find the reason behind the uniform comment trends in year 2014!
For 2014, posts of type Status and Photo dominate across every Words_Level. Since these two types generate the most Likes and Comments, they explain the comment trends in 2014.
UNIQ_WORDS: the number of words that remain after removing common words, also known as stop words.
The plot above shows an almost normal distribution of Uniq_Ratio, with the peak occurring at approximately 0.45. This means that the number of UNIQ_WORDS in a post was about 45% of its TOTAL_WORDS!
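A minimal sketch of how UNIQ_WORDS and Uniq_Ratio could be computed for one post is shown below. The stop-word list, the sample sentence and the helper name `uniq_ratio` are all assumptions; the report does not say which stop-word list or tokenizer was used. The last lines apply the ratio to the three non-empty posts visible in the str() output.

```python
# Tiny stop-word list assumed for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "with"}


def uniq_ratio(text):
    """Return (TOTAL_WORDS, UNIQ_WORDS, Uniq_Ratio) for one post.

    Words are split on whitespace, so punctuation is not stripped.
    """
    words = text.lower().split()
    kept = [w for w in words if w not in STOP_WORDS]
    ratio = len(kept) / len(words) if words else 0.0
    return len(words), len(kept), ratio


# Hypothetical post text, not taken from the dataset.
total, uniq, ratio = uniq_ratio("the trip to the hills with friends is on")
print(total, uniq, round(ratio, 2))

# (TOTAL_WORDS, UNIQ_WORDS) pairs of the non-empty posts in the str() output.
observed = [(218, 94), (46, 22), (46, 24)]
observed_ratios = [round(u / t, 2) for t, u in observed]
print(observed_ratios)
```

The three observed ratios (0.43, 0.48 and 0.52) sit close to the 0.45 peak described above.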